[QUARK-479] Fix per-layer quant dtype in DeepSeek attention init#268

Open
thpereir wants to merge 2 commits into thpereir/quark_quant_layer from
thpereir/deepseek_r1_mxfp4_ptpc

Conversation

thpereir (Contributor) commented Mar 4, 2026

For mixed-precision models (e.g. MXFP4 MoE + FP8 attention), the attention block must resolve its own per-layer quant spec rather than using the global quant_config['quant_dtype'].

  • Add _attn_spec / _attn_quant_dtype via quant_config.resolve(prefix)
  • Use resolved dtype for FP4/FP8 decision in attention init
  • Pass prefix to MergedReplicatedLinear for fused_qkv_a_proj
  • Use resolved dtype for fuse_qknorm_quant decision

Tested with DeepSeek-R1-0528-moe-mxfp4-other-ptpc on TP=4.

Depends on #236
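The per-layer resolution described above can be sketched as a longest-prefix-match lookup. This is a hypothetical illustration only: `QuantConfig`, `QuantSpec`, `resolve()`, and the dtype strings are assumed names, not the actual Quark API.

```python
# Hypothetical sketch of per-layer quant resolution. QuantConfig,
# QuantSpec, resolve(), and the dtype names are assumptions, not the
# real Quark interfaces.
from dataclasses import dataclass, field


@dataclass
class QuantSpec:
    quant_dtype: str  # e.g. "mxfp4" or "fp8"


@dataclass
class QuantConfig:
    global_spec: QuantSpec
    # Per-layer overrides keyed by module-name prefix.
    layer_overrides: dict = field(default_factory=dict)

    def resolve(self, prefix: str) -> QuantSpec:
        """Return the most specific spec whose key is a prefix of `prefix`."""
        best, best_len = self.global_spec, -1
        for key, spec in self.layer_overrides.items():
            if prefix.startswith(key) and len(key) > best_len:
                best, best_len = spec, len(key)
        return best


# Mixed-precision config: MXFP4 globally, FP8 override on an attention block.
cfg = QuantConfig(
    global_spec=QuantSpec("mxfp4"),
    layer_overrides={"model.layers.0.self_attn": QuantSpec("fp8")},
)

# The attention init would consult the resolved per-layer dtype, not the
# global one, when making its FP4/FP8 decision.
attn_spec = cfg.resolve("model.layers.0.self_attn.fused_qkv_a_proj")
print(attn_spec.quant_dtype)  # fp8, not the global mxfp4
```

Passing `prefix` down to modules like `MergedReplicatedLinear` is what makes this lookup possible at construction time for fused sublayers.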


thpereir added 2 commits March 6, 2026 16:03
Instead of checking the global quant_dtype to decide whether to bypass
FP4 quantization for MTP layers, use quant_config.resolve(prefix) to
check the per-layer spec. This correctly preserves FP8 quantization
for MTP layer 61 when the global config is MXFP4 but the layer has an
FP8 per-token override (as in the PTPC model format).
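A minimal sketch of that decision, assuming a `resolve`-style helper like the one this PR adds; the function name, prefix format, and dtype strings below are all hypothetical stand-ins, not the real code.

```python
# Hypothetical sketch of the MTP-layer fix described above. `resolve`
# stands in for quant_config.resolve(prefix); the dtype strings and
# prefixes are assumptions.
def bypass_fp4_for_mtp(resolve, prefix: str) -> bool:
    """Skip FP4 quantization only when the layer itself resolves to FP4."""
    return resolve(prefix) == "mxfp4"


# Toy config: MXFP4 globally, with an FP8 per-layer override on layer 61.
specs = {"model.layers.61.self_attn": "fp8"}
resolve = lambda prefix: specs.get(prefix, "mxfp4")

print(bypass_fp4_for_mtp(resolve, "model.layers.0.self_attn"))   # True
print(bypass_fp4_for_mtp(resolve, "model.layers.61.self_attn"))  # False
```

With the old global check, layer 61 would have been bypassed too, silently dropping its FP8 quantization.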